VLIW processor with power saving

ABSTRACT

A data processing apparatus has an instruction memory system arranged to output an instruction word, capable of containing a plurality of instructions, respective instruction words being output in response to respective instruction addresses. An instruction execution unit contains a plurality of functional units, each capable of executing a respective instruction from the instruction word in parallel with execution of other instructions from the instruction word by other ones of the functional units. A power saving circuit is provided to switch a selectable subset of the functional units and/or parts of the instruction memory to a power saving state, while other functional units and parts of the instruction memory continue processing instructions in a normal power consuming state. The power saving circuit selects the functional units and/or parts of the instruction memory dependent on program execution.

The invention relates to a data processing apparatus, such as a VLIW (Very Long Instruction Word) processor, that is capable of executing a plurality of instructions from an instruction word in parallel.

A VLIW processor makes it possible to execute programs with a high degree of instruction parallelism. Conventionally, in each instruction cycle the VLIW processor fetches an instruction word that contains a fixed number, greater than one, of instructions (often called operations). The VLIW processor executes these operations in parallel in the same instruction cycle (or cycles). For this purpose the VLIW processor contains a plurality of functional units, each capable of executing one of the operations from the instruction word at a time. Different kinds of functional units are typically provided, such as ALUs (arithmetic logic units), multipliers, branch control units, memory access units etc. Often dedicated-purpose functional units are also included, designed to speed up programs for particular applications. Thus, for example, functional units for performing parts of MPEG encoding or decoding may be added.

In advanced VLIW processors hundreds of functional units may be present. In principle, the instruction word may contain instructions for all of these functional units in parallel. Often the functional units are organized into groups of one or more functional units, an instruction word providing one instruction per group. When at least some of the groups contain more than one functional unit, grouping limits the length of the instruction word without reducing the number of functional units.
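
As a rough illustration of why grouping limits the word length, the sketch below assumes one fixed-width instruction field per group. The seven groups echo groups 70 a-g of FIG. 1; the three functional units per group and the 32-bit field width are hypothetical values chosen only for the arithmetic.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GroupedLayout:
    """Hypothetical instruction-word layout: one field per group of functional units."""

    units_per_group: Tuple[int, ...]  # number of functional units in each group
    field_bits: int                   # assumed width of one instruction field

    @property
    def num_functional_units(self) -> int:
        return sum(self.units_per_group)

    @property
    def word_bits(self) -> int:
        # One field per group, however many functional units the group holds.
        return len(self.units_per_group) * self.field_bits

    @property
    def ungrouped_word_bits(self) -> int:
        # For comparison: one field per functional unit.
        return self.num_functional_units * self.field_bits


layout = GroupedLayout(units_per_group=(3,) * 7, field_bits=32)
print(layout.num_functional_units, layout.word_bits, layout.ungrouped_word_bits)  # prints 21 224 672
```

With these assumed numbers the grouped word carries 224 bits for 21 functional units, against 672 bits with one field per unit.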

All functional units inevitably consume power supply current. When a VLIW processor contains many functional units that operate in parallel, considerable power consumption therefore occurs. This is inconsistent with the requirements of battery-operated apparatuses. Because of the heat generated by this power consumption, it may also increase the cost of the cooling measures needed to operate the VLIW processor in a single package.

U.S. Pat. No. 5,815,725 describes the use of clock gating to reduce power consumption in a microprocessor. A monitor circuit monitors whether the microprocessor enters a low activity operational state and, if so, gates clock signals to the microprocessor. In U.S. Pat. No. 5,815,725 the clock gating involves disabling the clock signal in only part of the clock cycles, because the microprocessor must continue to operate. U.S. Pat. No. 5,661,751 describes clock gating during which the clock signal to a peripheral of a microprocessor (a UART) is completely disabled. Similarly, U.S. Pat. No. 6,345,336 describes disabling of clock signals of part of a cache memory.

Clock gating reduces power consumption, but when applied to the instruction execution part of a processor it has the disadvantage that it reduces the capability of executing instructions. Significantly, U.S. Pat. No. 5,661,751 and U.S. Pat. No. 6,345,336 apply clock gating to peripheral or auxiliary circuits and not to the instruction execution circuit or the whole instruction memory. U.S. Pat. No. 5,815,725 attempts to mitigate the problem of completely disabling the clock signal of the microprocessor by disabling the clock signal only in part of the clock cycles. Nevertheless the rate of instruction execution is reduced.

Among others, it is an object of the invention to provide a data processing apparatus which uses power saving measures during execution of instructions to reduce power supply consumption without reducing the rate at which instructions can be executed.

The invention provides a data processing apparatus according to claim 1. This data processing apparatus is of a type, such as a VLIW processor, that processes instruction words that each contain a plurality of instructions. Different functional units execute the instructions from an instruction word in parallel. According to the invention the processing apparatus is constructed so that power saving measures, such as clock gating, can be applied selectively, dependent on program execution, to part of the functional units and/or to the memory units that supply instructions to respective ones of the functional units or groups of functional units. Much power can be saved in the memory units in particular.

The invention is based on the insight that there exist useful application programs in which the utilization of functional units varies from one program section to another. In such applications it can be determined in advance which functional units will be used in which section. For example, in a program that involves MPEG encoding, specialized functional units for specific tasks in such encoding are only used in specific sections. When the processor executes instruction words from a program section, power saving may be used to disable clock signals of the functional units and/or memory units that are known not to be used in that section.

When the instruction word contains a field dedicated to instructions for a functional unit in which power saving measures are applied, the apparatus may automatically also apply power saving measures to the section of instruction memory that provides that field when clock gating is applied to that functional unit. More generally, the processing apparatus may apply power saving measures to any resources, such as a register file or peripheral circuits, that are dedicated to the functional unit to which power saving measures are applied.

It has also been discovered that in many useful application programs the utilization of different functional units is correlated: in a program section where one functional unit is not used, certain correlated functional units are not used either. Therefore it is advantageous to combine such functional units into a group and to arrange the circuit so that clock gating disables clock signals to all functional units in the group. When the group contains no functional units that are used in a program section, clock gating can be used to disable clock signals to all of the functional units of the group. Moreover, when resources are shared per group of functional units, clock gating can also be applied to those resources.

These and other objects and advantageous aspects of the processing apparatus and method of processing according to the invention will be described in more detail with reference to FIG. 1, which shows a processing apparatus.

FIG. 1 shows a processing apparatus that contains a memory system 10 with memory units 12 a-g, a controller 14, and an instruction execution unit 7 that contains groups 70 a-g of functional units 18 a-c, a register file 72 and an instruction address counter unit 74. Instruction address counter unit 74 has an instruction address output coupled to controller 14. Controller 14 has selection outputs 16 coupled to memory units 12 a-g and to groups 70 a-g. Furthermore, controller 14 has address outputs coupled to memory units 12 a-g. Memory units 12 a-g have instruction outputs coupled to respective ones of groups 70 a-g and to register file 72. Register file 72 has operand/result output/input ports (not shown separately) coupled to groups 70 a-g. Groups 70 a-g each contain one or more functional units 18 a-c (the functional units of only one group being shown explicitly), which all have clock gating inputs coupled to the selection outputs 16 of controller 14, operation code inputs coupled to memory units 12 a-g, operand inputs coupled to register file 72 and result outputs coupled to register file 72 (all except the clock gating inputs being symbolized by a single connection between memory units 12 a-g, groups 70 a-g of functional units 18 a-c and register file 72). One of groups 70 a-g has a branch address output coupled to instruction address counter unit 74.

In operation the processing apparatus operates in successive instruction cycles. In successive instruction cycles, address counter unit 74 outputs addresses of successive instructions to controller 14 (these instructions are called "successive" because the corresponding instructions are executed successively, although in the case of branches the addresses may not be successive). Controller 14 outputs further instruction addresses, derived from the instruction address, to memory units 12 a-g. The further instruction addresses address instruction memory locations in memory units 12 a-g. Memory units 12 a-g output the instructions from the addressed locations to instruction execution unit 7. The combination of instructions output from memory units 12 a-g forms an instruction word with fields for the various instructions.
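
Claim 5 allows a memory unit to hold fields only for instruction words in its own range of instruction addresses, with partial overlap between units. A minimal sketch of how controller 14 might derive the further addresses under that arrangement, assuming (the description does not specify this) that each unit's local address is simply the offset from the start of its range; `UnitRange` and `derive_unit_addresses` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UnitRange:
    """Hypothetical: the instruction-address range [start, end) for which a memory unit holds fields."""

    start: int
    end: int  # exclusive


def derive_unit_addresses(address: int, ranges: List[UnitRange]) -> List[Optional[int]]:
    """Map one instruction address to a local address per memory unit.

    None means the unit holds no field for this instruction word, so it need not be read.
    """
    local: List[Optional[int]] = []
    for r in ranges:
        if r.start <= address < r.end:
            local.append(address - r.start)  # offset into this unit's own storage
        else:
            local.append(None)
    return local


# Three units with partially overlapping ranges (values are illustrative).
ranges = [UnitRange(0, 1024), UnitRange(256, 512), UnitRange(400, 1024)]
at_300 = derive_unit_addresses(300, ranges)  # [300, 44, None]
at_450 = derive_unit_addresses(450, ranges)  # [450, 194, 50]
```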

Controller 14 also outputs selection signals which are applied to memory units 12 a-g. Each selection signal indicates whether an instruction from a respective memory unit 12 a-g is needed for the current instruction cycle. When the selection signal indicates that no instruction is needed from a particular memory unit 12 a-g, that memory unit is switched to a power saving state, for example by disabling clock signals in it. These clock signals include, for example, the clock signal that signals the output driver of the memory unit to change the instruction output from the particular memory unit 12 a-g, or the clock signal used to precharge bit lines and/or word lines etc. When these clock signals are disabled power is saved, for example because no charging current for outputs, bit lines and/or word lines is needed. Other ways of saving power include disconnecting a power supply source from circuits that need not retain a state during power saving.
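
The per-cycle gating of memory units 12 a-g can be modelled in software as follows. This is a behavioural sketch only: the count of clocked cycles stands in for power consumption, and the `nop` default for unprogrammed locations is an assumption.

```python
from typing import Dict, List, Optional


class MemoryUnit:
    """Toy model of one instruction memory unit (cf. memory units 12 a-g)."""

    def __init__(self, contents: Dict[int, str]):
        self.contents = contents
        self.clocked_cycles = 0             # cycles in which the unit was actually clocked
        self.output: Optional[str] = None   # value currently held by the output driver

    def clock(self, address: int, selected: bool) -> Optional[str]:
        if not selected:
            # Clock gated: no precharge and no output transition, so the old output is held.
            return None
        self.clocked_cycles += 1
        self.output = self.contents.get(address, "nop")
        return self.output


def fetch_word(units: List[MemoryUnit], address: int, selection: List[bool]) -> List[Optional[str]]:
    """Assemble one instruction word; deselected units contribute no field."""
    return [u.clock(address, s) for u, s in zip(units, selection)]


units = [MemoryUnit({0: "add", 1: "sub"}), MemoryUnit({0: "mpeg_dct", 1: "mpeg_dct"})]
w0 = fetch_word(units, 0, [True, False])
w1 = fetch_word(units, 1, [True, False])
```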

Each group 70 a-g of functional units 18 a-c receives an instruction from a respective one of memory units 12 a-g and the selection signal that is applied to that memory unit 12 a-g. The selection signal controls whether the group of functional units is switched to a power saving state, for example by disabling clock signals in the functional units 18 a-c in groups 70 a-g. The disabled clock signals include, for example, clock signals that cause logic transitions in the output signals from output drivers of functional units 18 a-c, or clock signals involved in precharging signal lines. Also, some functional units contain data memory that consumes less power when the clock is disabled. When these clock signals are disabled power is saved, for example because no charge current for outputs or signal lines is needed.

In those groups where the selection signal does not indicate that clock signals should be disabled, the functional units 18 a-c of the group 70 a-g determine which of the functional units 18 a-c of the group should execute the instruction from the corresponding memory unit 12 a-g, and that functional unit reads the operands addressed by the instruction (if any) from register file 72 and supplies results (if any) to register file 72.
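
A behavioural sketch of one group 70 a-g handling its instruction field, assuming a hypothetical (opcode, destination, source, source) encoding in which each opcode is recognised by exactly one functional unit of the group.

```python
from typing import Callable, Dict, List, Optional, Tuple

Instruction = Tuple[str, int, int, int]  # (opcode, dst, src1, src2), a hypothetical encoding


class Group:
    """Toy model of one group of functional units sharing one instruction field."""

    def __init__(self, units: Dict[str, Callable[[int, int], int]]):
        # Each opcode is recognised by exactly one functional unit of the group.
        self.units = units

    def execute(self, instr: Optional[Instruction], selected: bool, regs: List[int]) -> bool:
        """Return True if some functional unit of the group executed an instruction."""
        if not selected or instr is None:
            return False  # clock gated (or empty field): nothing is computed
        opcode, dst, src1, src2 = instr
        unit = self.units[opcode]                 # the unit that recognises the opcode
        regs[dst] = unit(regs[src1], regs[src2])  # read operands, write the result back
        return True


alu_group = Group({"add": lambda a, b: a + b, "mul": lambda a, b: a * b})
regs = [3, 4, 0, 0]
ran_mul = alu_group.execute(("mul", 2, 0, 1), selected=True, regs=regs)   # regs[2] = 12
ran_add = alu_group.execute(("add", 3, 0, 1), selected=False, regs=regs)  # gated: regs[3] stays 0
```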

Although it is preferred that clocks are disabled both in cooperating memory units 12 a-g and in groups 70 a-g, it will be understood that a power advantage is already gained when clock signals are disabled in only one of them.

Controller 14 is capable of selecting and deselecting memory units 12 a-g and/or groups 70 a-g independently of other memory units 12 a-g and/or groups 70 a-g. Selection may be controlled in various ways. In one embodiment, memory mapping information is loaded into a control memory (not shown) in controller 14 prior to execution of a program of instruction words from memory units 12 a-g. In this case the memory mapping information indicates, for a number of ranges of instruction addresses from instruction address counter unit 74, which of the selection signals should be activated. When controller 14 receives an instruction address from instruction address counter unit 74, it detects the address range that contains the instruction address and supplies the selection signals stored for that range.
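
The memory mapping embodiment amounts to a range lookup from instruction address to selection signals. The sketch below assumes non-overlapping ranges, one mask bit per group (bit 0 for group 70a up to bit 6 for group 70g), and all groups enabled outside the listed ranges; the class name and these conventions are illustrative, not taken from the description.

```python
import bisect
from typing import List, Tuple

Entry = Tuple[int, int, int]  # (start address, end address exclusive, selection mask)


class SelectionMap:
    """Hypothetical contents of the control memory in controller 14.

    Bit i of a mask is the selection signal for group i (0 = group 70a, ..., 6 = group 70g).
    """

    def __init__(self, entries: List[Entry], default_mask: int):
        self.entries = sorted(entries)            # assumed non-overlapping
        self.starts = [e[0] for e in self.entries]
        self.default_mask = default_mask          # used outside every listed range

    def lookup(self, address: int) -> int:
        # Index of the last range that starts at or before the address.
        i = bisect.bisect_right(self.starts, address) - 1
        if i >= 0:
            start, end, mask = self.entries[i]
            if address < end:
                return mask
        return self.default_mask


ALL_GROUPS = 0b1111111
smap = SelectionMap([(0x000, 0x100, 0b0000001),   # set-up code: group 70a only
                     (0x100, 0x180, 0b1111111)],  # inner loop: all groups
                    default_mask=ALL_GROUPS)
```

Falling back to an all-enabled mask for unlisted addresses trades some power for safety: code that was not analysed still finds every functional unit clocked.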

In another embodiment, switching selected selection signals off or on is commanded from the instruction words that are executed by execution unit 7. For this purpose a special selection control functional unit may be provided in one of the groups 70 a-g that executes instructions which contain indications of the groups 70 a-g that should receive selection signals. Such an instruction may, for example, be in the form of a mask with respective bits for different groups, indicating whether or not each group should be selected, or in the form of numbers that indicate a group whose selection should be activated or deactivated. Thus, different subsets of (groups of) functional units in which clock signals are to be disabled can be selected. In an extremely simple embodiment, wherein clock signals can be disabled in only one such subset, the command need not specify the subset.
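
The two command forms mentioned above, a full mask and per-group activate/deactivate numbers, could be modelled as below. The command classes are hypothetical; the description does not define an instruction encoding.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SetMask:
    mask: int   # one bit per group: 1 = selected (clocked), 0 = power saving


@dataclass(frozen=True)
class Activate:
    group: int  # number of the group whose selection is activated


@dataclass(frozen=True)
class Deactivate:
    group: int  # number of the group whose selection is deactivated


Command = Union[SetMask, Activate, Deactivate]


def apply_command(current: int, cmd: Command) -> int:
    """Return the new selection mask after a selection-control instruction."""
    if isinstance(cmd, SetMask):
        return cmd.mask
    if isinstance(cmd, Activate):
        return current | (1 << cmd.group)
    if isinstance(cmd, Deactivate):
        return current & ~(1 << cmd.group)
    raise TypeError(f"unknown selection command: {cmd!r}")


mask = 0b1111111
for cmd in (Deactivate(6), Deactivate(5), Activate(5)):
    mask = apply_command(mask, cmd)  # ends with only group 6 deselected
```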

Although FIG. 1 shows that all groups 70 a-g receive selection signals, it will be understood that the invention is not limited to the use of selection signals for all groups. In practice controller 14 may not have a selection output for some of the groups 70 a-g, and those groups may then have no selection input. Thus, these groups are always active. Preferably, at least one group is always active. Also, although each group is shown to receive its own independently settable selection signal, it will be understood that in practice some groups may receive a shared selection signal. Furthermore, although all groups have been shown without distinctions, it will be understood that the groups may in fact differ: functional units in some groups may receive literal data, such as branch addresses or constants, from memory units 12 a-g, whereas others merely receive operation codes, data being supplied from register file 72; some groups may receive larger numbers of operands than others, or produce larger numbers of results.
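
The wiring variants above (groups without a selection input, and groups sharing one selection signal) reduce to a small mapping from selection outputs to group enables. This sketch and its wiring list format are assumptions for illustration.

```python
from typing import List, Optional


def group_enables(selection_outputs: List[bool], wiring: List[Optional[int]]) -> List[bool]:
    """Compute the effective enable of each group.

    wiring[g] is the index of the controller selection output that drives group g,
    or None when group g has no selection input and is therefore always active.
    Several groups may name the same index, i.e. share one selection signal.
    """
    return [True if src is None else selection_outputs[src] for src in wiring]


# Hypothetical wiring: group 0 always active, groups 1 and 2 share output 0, group 3 uses output 1.
enables = group_enables([False, True], [None, 0, 0, 1])
```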

As shown, one of the groups 70 a-g has a connection from a branch functional unit (not shown) to update the instruction address in instruction address counter unit 74 in response to an instruction. The branch functional unit executes this update, for example, when it determines that some condition has been met. Updates may be absolute (replacement of the program counter value in address counter unit 74) or relative (addition to the program counter value). This is shown by way of example. In practice more than one group 70 a-g may contain one or more branch functional units coupled to instruction address counter unit 74.
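
The absolute and relative branch updates can be sketched as a next-address function. The tuple encoding of a branch and the step of 1 between consecutive instruction words are assumptions.

```python
from typing import Optional, Tuple

Branch = Tuple[str, int, bool]  # (kind, value, condition met), a hypothetical encoding


def next_address(pc: int, branch: Optional[Branch] = None) -> int:
    """Next instruction address produced by instruction address counter unit 74.

    Assumes consecutive instruction words have consecutive addresses (step 1).
    """
    if branch is not None:
        kind, value, taken = branch
        if taken:
            if kind == "absolute":
                return value       # replace the program counter value
            if kind == "relative":
                return pc + value  # add to the program counter value
            raise ValueError(f"unknown branch kind: {kind}")
    return pc + 1
```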

Furthermore, although separate memory units 12 a-g have been shown for respective groups 70 a-g of functional units, it will be understood that some groups may share a memory unit 12 a-g, so that the memory unit produces instructions for these groups in parallel (in general such memory units will have a wider instruction output than other ones of memory units 12 a-g). Of course, clock signals are disabled in such a memory unit, if at all, only when none of the groups 70 a-g of functional units connected to that memory unit needs an instruction. This can be implemented using a detector to determine whether none of the relevant groups of functional units needs an instruction, or it may be indicated by instructions from the program.
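
The detector for a shared memory unit is an OR over the groups it serves: the unit must stay clocked if any of them needs an instruction. A minimal sketch, with hypothetical names.

```python
from typing import List, Sequence


def shared_unit_clock_enable(group_needs_instruction: Sequence[bool], groups_served: List[int]) -> bool:
    """Detector for a memory unit shared by several groups.

    The unit may be clock gated only when none of the groups it serves needs an instruction.
    """
    return any(group_needs_instruction[g] for g in groups_served)


needs = [True, False, False, True]                 # hypothetical per-group demand this cycle
enable_12 = shared_unit_clock_enable(needs, [1, 2])  # False: the unit may be gated
enable_23 = shared_unit_clock_enable(needs, [2, 3])  # True: group 3 needs its field
```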

Furthermore, in some designs register file 72 may be split into a number of register files, some of which are coupled only to a subset of groups 70 a-g of functional units 18 a-c, sometimes even to only one group 70 a-g, in which case that register file can be regarded as part of the relevant group. In the latter case, power saving may be applied to a register file that is connected only to one group 70 a-g that is not currently selected, for example by disabling clock signals in that register file. When more than one group has access to a register file, power saving may be applied to that register file when the selection signals from controller 14 disable clock signals in all of the groups that have access to it. For this purpose controller 14 may be provided with a separate selection output for this register file, so that power saving in the register file can be controlled explicitly. Alternatively, a detection circuit may be provided to detect whether the selection signals of all involved groups 70 a-g signal that power saving should be applied and, if so, to signal that power saving should be applied to the register file as well.
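
The two options for a split register file, a dedicated selection output or a detection circuit over the selection signals of all groups with access, can be sketched together. Function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence


def register_file_power_save(group_selected: Sequence[bool],
                             groups_with_access: List[int],
                             explicit: Optional[bool] = None) -> bool:
    """Decide whether a (split) register file may enter the power saving state.

    `explicit` models a dedicated selection output of controller 14 for this register file.
    When it is None, a detection circuit is modelled instead: power saving is allowed
    only when every group with access to the register file is deselected.
    """
    if explicit is not None:
        return explicit
    return not any(group_selected[g] for g in groups_with_access)


selected = [True, False, False]  # hypothetical: only group 0 is selected this cycle
rf_a = register_file_power_save(selected, [1, 2])                  # True: both users deselected
rf_b = register_file_power_save(selected, [0, 1])                  # False: group 0 still uses it
rf_c = register_file_power_save(selected, [1, 2], explicit=False)  # controller overrides
```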

In practice the processing apparatus may use pipelining of instruction execution. That is, in the same instruction cycle, controller 14 may process one instruction address, memory units 12 a-g may retrieve instructions for a preceding instruction address and functional units 18 a-c may perform one or more processing stages for one or more still earlier instruction addresses. In this case power saving, and more particularly the disabling of clock signals, may also be pipelined, for example by delaying the selection signals from controller 14 by different numbers of instruction cycles for memory units 12 a-g and for different pipeline stages of functional units 18 a-c.
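
Pipelined gating can be modelled as one delay line per pipeline stage, so that a selection mask reaches each stage in the same cycle as the instruction it belongs to. The delays of one and two cycles and the all-enabled reset value are assumptions for the example.

```python
from collections import deque


class SelectionDelay:
    """Delay a selection mask by a fixed number of instruction cycles."""

    def __init__(self, depth: int, reset_mask: int):
        self.depth = depth
        # Before real masks arrive, the stage sees reset_mask (e.g. all groups enabled).
        self.fifo = deque([reset_mask] * depth)

    def step(self, mask: int) -> int:
        """Push this cycle's mask; return the mask that applies to the stage now."""
        if self.depth == 0:
            return mask
        self.fifo.append(mask)
        return self.fifo.popleft()


# Hypothetical pipeline: memory units one cycle behind controller 14, execute stage two cycles behind.
memory_stage = SelectionDelay(depth=1, reset_mask=0b1111111)
execute_stage = SelectionDelay(depth=2, reset_mask=0b1111111)
trace = []
for mask in (0b0000001, 0b0000011, 0b0000111):
    trace.append((memory_stage.step(mask), execute_stage.step(mask)))
```

During the first cycles, before real masks have propagated, each stage sees the all-enabled reset mask.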

Prior to program execution, it should preferably be determined which program parts need which groups 70 a-g of functional units. This is partly a matter of taking account of the specialized functions of functional units 18 a-c, but it may also depend on the different amounts of parallelism required in different parts of the program. For example, a higher degree of parallelism may be needed inside an inner loop.

Programming of the data processing apparatus starts with determining a description of the operations that have to be performed, for example compiled from a program in a high-level computer language. Subsequently, a step is performed to map the operations to functional units. This mapping step allows some freedom. For example, some arithmetic and logic operations could be performed sequentially on one arithmetic logic functional unit, or in parallel on different arithmetic logic functional units. During the mapping step, an inner loop and the surrounding parts of the program may be identified (which are executed many times, and only once or a few times, respectively, each time the program is executed). In this case the operations of the inner loop are preferably mapped to allow parallel execution in different functional units, whereas operations from the surrounding parts are preferably mapped to one functional unit or a limited subset of the functional units, using sequential execution. Moreover, during the mapping step some operations can only be mapped to specific functional units or to a group of functional units. Certain MPEG encoding or decoding functions are examples of this.
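
The mapping freedom described above, parallel placement for an inner loop versus sequential placement on one unit for the surrounding code, can be illustrated with a deliberately simplified packer. It assumes the operations are mutually independent; a real mapper must respect data dependencies and unit capabilities, which this sketch ignores.

```python
from typing import Dict, List, Sequence, Set

Word = Dict[str, str]  # functional unit name -> operation placed in that unit's field


def map_operations(ops: Sequence[str], units: Sequence[str], parallel: bool) -> List[Word]:
    """Pack operations into instruction words (operations assumed independent)."""
    if not parallel:
        # Sequential mapping: everything on the first unit, one operation per word.
        return [{units[0]: op} for op in ops]
    width = len(units)
    return [dict(zip(units, ops[i:i + width])) for i in range(0, len(ops), width)]


def units_used(words: List[Word]) -> Set[str]:
    return {u for w in words for u in w}


alus = ["alu0", "alu1", "alu2"]
loop_body = map_operations(["a", "b", "c", "d", "e", "f"], alus, parallel=True)
prologue = map_operations(["p", "q"], alus, parallel=False)
```

The prologue occupies more instruction words but touches only `alu0`, which is what later allows the other units to be clock gated while it runs.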

In a selection step, the combinations of (groups of) functional units that are used in respective sections of the program are identified, and information is compiled that indicates which combinations are used in which sections. This information is subsequently used during execution of the program to disable clock signals selectively in those (groups of) functional units that are not used in a section when instructions from that section are executed, for example in the form of memory mapping information or in the form of commands to disable or enable clock signals in selected functional units.
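
The selection step can be sketched as a pass over each program section that records which groups receive any instruction and encodes that set as a selection mask per address range, one possible form of the memory mapping information. The section format, group naming and bit assignment are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Word = Dict[str, Optional[str]]  # group name -> instruction field (None = empty field)


def groups_used(words: Iterable[Word]) -> Set[str]:
    """Groups that receive at least one instruction anywhere in a program section."""
    return {g for w in words for g, instr in w.items() if instr is not None}


def compile_selection_entries(sections: List[Tuple[int, int, List[Word]]],
                              group_bit: Dict[str, int]) -> List[Tuple[int, int, int]]:
    """Turn (start, end, words) sections into (start, end, mask) memory-mapping entries."""
    entries = []
    for start, end, words in sections:
        mask = 0
        for g in groups_used(words):
            mask |= 1 << group_bit[g]
        entries.append((start, end, mask))
    return entries


group_bit = {name: i for i, name in enumerate("abcdefg")}  # group 70a -> bit 0, ..., 70g -> bit 6
sections = [
    (0x000, 0x002, [{"a": "ld", "g": None}, {"a": "add", "g": None}]),
    (0x002, 0x004, [{"a": "add", "g": "dct"}, {"a": None, "g": "dct"}]),
]
entries = compile_selection_entries(sections, group_bit)
```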

1. A data processing apparatus, the apparatus comprising an instruction memory system arranged to output an instruction word, capable of containing a plurality of instructions, respective instruction words being output in response to respective instruction addresses; an instruction execution unit, comprising a plurality of functional units, each capable of executing a respective instruction from the instruction word in parallel with execution of other instructions from the instruction word by other ones of the functional units; a power saving circuit arranged to switch a selectable subset of the functional units and/or parts of the instruction memory that supply instructions from the instruction word to the functional units to a power saving state during program execution, the power saving circuit being arranged to select the functional units and/or parts of the instruction memory in the subset dependent on program execution.
2. A data processing apparatus according to claim 1, wherein clock signals to the functional units and/or parts of the instruction memory in the subset are disabled in said power saving state.
3. A data processing apparatus according to claim 1, wherein the functional units are organized into groups of one or more functional units each, the functional unit or units in each respective group receiving instructions from a respective instruction field in the instruction word, each time for execution by one of the functional units in the group, the power saving circuit selecting the functional units that are switched to the power saving state per group.
4. A data processing apparatus according to claim 1, wherein the instruction memory system comprises a plurality of memory units, each for supplying a respective instruction field in the instruction word for an instruction for a respective functional unit or group of functional units, the power saving circuit being arranged to switch to the power saving state those memory units that supply the instruction fields for the selectable ones of the functional units that are switched to the power saving state.
5. A data processing apparatus according to claim 4, wherein the memory units each comprise memory locations for at least a part of each instruction word only for instruction words in a respective range of instruction addresses, the instruction memory system allowing for partial overlap of the respective ranges of different ones of the memory units.
6. A data processing apparatus according to claim 1, wherein the power saving circuit is arranged to select the subset dependent on an instruction address associated with the instruction word.
7. A data processing apparatus according to claim 1, wherein the power saving circuit is arranged to select the subset under control of one or more instructions contained in a program executed by the data processing apparatus.
8. A data processing apparatus according to claim 7, wherein said one or more instructions specify the subset.
9. A method of executing a program of instructions using a data processing apparatus according to claim 1, the method comprising identifying a part of the program wherein the instruction word does not contain instructions for functional units in a particular one of the groups, and using the power saving circuit to switch to the power saving state the functional units that are contained in the particular one of the groups and/or memory units that are coupled to the particular one of the groups, during execution of said identified part of the program.